Early Experiences Writing Performance Portable OpenMP 4 Codes

Authors

Verónica G. Vergara Larrea

Wayne Joubert

M. Graham Lopez

Oscar Hernandez

Abstract

In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand whether the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, compared with the first-touch policy used in traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and on the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.

Keywords: Performance Portable Programming Models; Shared Memory Programming; Accelerator Programming; OpenMP 4.0
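The two programming styles contrasted in the abstract can be illustrated with a minimal sketch. The kernel below (a simple AXPY loop) and all function names are assumptions chosen for illustration and are not taken from the paper. The first version uses the traditional shared memory `parallel for`, where data placement is left to the host's first-touch policy. The second uses the OpenMP 4 `target` construct with explicit `map` clauses, which falls back to the host when no device is available.

```c
#include <assert.h>
#include <stdlib.h>

/* Traditional shared-memory OpenMP: host threads split the loop.
 * (Illustrative kernel; not the kernel evaluated in the paper.) */
void axpy_host(int n, double a, const double *x, double *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* OpenMP 4 accelerator model: map clauses state which arrays are
 * copied to the device and which are copied back afterwards. */
void axpy_target(int n, double a, const double *x, double *y)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Hypothetical self-check: returns 1 when both versions compute
 * y[i] = 1 + 2*i for x[i] = i, starting from y[i] = 1. */
int axpy_versions_agree(void)
{
    enum { N = 8 };
    double *x = malloc(N * sizeof *x);
    double *y_host = malloc(N * sizeof *y_host);
    double *y_target = malloc(N * sizeof *y_target);
    assert(x && y_host && y_target);

    for (int i = 0; i < N; i++) {
        x[i] = (double)i;
        y_host[i] = 1.0;
        y_target[i] = 1.0;
    }

    axpy_host(N, 2.0, x, y_host);
    axpy_target(N, 2.0, x, y_target);

    int ok = 1;
    for (int i = 0; i < N; i++)
        if (y_host[i] != 1.0 + 2.0 * i || y_target[i] != y_host[i])
            ok = 0;

    free(x);
    free(y_host);
    free(y_target);
    return ok;
}
```

Compiled without OpenMP support, the pragmas are ignored and both functions run serially, so the sketch only shows the difference in how parallelism and data movement are expressed, not any performance behaviour.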

Similar resources

Pragmatic Performance Portability with OpenMP 4.x. In OpenMP: Memory, Devices, and Tasks: 12th International Workshop on OpenMP, IWOMP

In this paper we investigate the current compiler technologies supporting OpenMP 4.x features targeting a range of devices, in particular, the Cray compiler 8.5.0 targeting an Intel Xeon Broadwell and NVIDIA K20x, IBM’s OpenMP 4.5 Clang branch (clang-ykt) targeting an NVIDIA K20x, the Intel compiler 16 targeting an Intel Xeon Phi Knights Landing, and GCC 6.1 targeting an AMD APU. We outline the...

Evaluation of Directive-based Performance Portable Programming Models

We present an extended exploration of the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architectures with attached accelerators, both self-hosted multicore and offload multicore/GPU. Our goal is to examine how successful OpenACC and the newer offload features of OpenMP 4.5 are for moving codes between architectures, and we document how ...

Extending OpenMP to Support Slipstream Execution Mode

OpenMP has emerged as a widely accepted standard for writing shared memory programs. Hardware-specific extensions such as data placement are usually needed to improve the scalability of applications based on this standard. This paper investigates the implementation of an OpenMP compiler that supports slipstream execution mode, a new optimization mechanism for CMP-based distributed shared memory...

Early Experiences with the OpenMP Accelerator Model

A recent trend in mainstream computer nodes is the combined use of general-purpose multicore processors and specialized accelerators such as GPUs and DSPs in order to achieve better performance and to reduce power consumption. To support this trend, the OpenMP Language Committee has approved a set of extensions to OpenMP (referred to as the OpenMP accelerator model). The initial version is the ...

Porting and performance evaluation of irregular codes using OpenMP

In the last two years, OpenMP has been gaining popularity as a standard for developing portable shared memory parallel programs. With the improvements in centralized shared memory technologies and the emergence of distributed shared memory (DSM) architectures, several medium-to-large physical and logical shared memory configurations are now available. Thus, OpenMP stands to be a promising medium...

Publication date: 2016
